I. INTRODUCTION

Diabetes is a group of metabolic conditions characterized by high blood sugar [1]. The two main forms are type 1 diabetes, in which the body cannot make enough insulin, and type 2 diabetes, in which glucose-consuming organs cannot respond properly to insulin [1].
Individuals with diabetes are at high risk of developing high blood pressure, high cholesterol, and high triglycerides, which in turn can eventually lead to heart disease, stroke, blindness, and kidney failure. Diabetes is also associated with an increased risk of dementia and Alzheimer’s disease [2], as well as cancers such as liver, pancreatic, and breast cancer [3]. Because of this high diabetes-related morbidity and mortality, average medical costs for patients with diabetes are 2.3 times higher than for the rest of the population [4]. Diabetes prevalence in different counties of the United States is visualized in Figure 1A.

Various US-wide educational programs are aimed at diabetes prevention. However, targeting the right population remains an important roadblock [5]. Diabetes incidence differs widely by age, gender, and ethnicity/race [6,7]. Importantly, individuals of different age, gender, and racial/ethnic backgrounds are disproportionately affected by neighborhood social vulnerability. The social vulnerability index (SVI) is used by the Centers for Disease Control and Prevention (CDC) to describe the vulnerability of a specific county within the US (Figure 1B) based on factors such as socioeconomic status, household composition and disability, minority status and language, and housing and transportation access [8]. SVI has been linked to other metabolic outcomes such as cardiovascular mortality [9]. SVI is used by state and local health departments and non-profits to guide community-based health promotion initiatives [1]. However, the extent to which SVI affects diabetes prevalence is not known.

Figure 1: Diabetes Prevalence and Average SVI in Different Counties of the United States

A. Diabetes Prevalence per County in Different States

B. SVI Averages per County in Different States

The aim of this study is to model the relationship between SVI and diabetes prevalence in different states in the USA.

II. METHODS & MATERIALS

A. Data Set
The main data were obtained from the CDC web site (https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html) by setting the “Year” variable to 2018 (the latest available data) and “Geography” to United States. This is a retrospective dataset; the study population consists of the individual counties within the states of the USA for which the percentage of diagnosed diabetes cases and SVI are available. This dataset is attached as an appendix and is labeled “cdc_Diabet_socFctrs2018.csv”. The dataset contained 3142 observations. Two additional datasets, attached to this document and labeled “ggmap_states.csv” and “ggmap_counties.csv”, were obtained using the ggmap package in the R programming language [10]. The first contains longitude and latitude values for individual states, and the second contains the same information for individual counties. These datasets were used to create a map of the diagnosed diabetes percentage and a map of the SVI of individual counties within states. To add widely accepted state abbreviations for improved plot aesthetics, a dataset of state names and their corresponding abbreviations was obtained from the United States Postal Service web site (https://about.usps.com/who-we-are/postal-history/state-abbreviations.htm). This dataset is also attached and labeled “stat_name_abbrevs.csv”.

B. Study Population
The county-level data were filtered to remove counties with missing values in the SVI or percent diagnosed diabetes variables. The data were then filtered further so that each state had at least 25 observations, allowing the Central Limit Theorem to be applied. The final dataset had 2965 observations and is attached as “cdc_diagDiab_final.csv”. The variable characteristics are summarized in Table 1. The data were also examined using histograms and boxplots; the boxplots are shown in Figure 2.
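The two filtering steps can be sketched as follows. The study was carried out in R; this is an illustrative Python sketch only, and the field names `state`, `svi`, and `diabetes_pct` are hypothetical placeholders, not the column names of the attached files.

```python
from collections import Counter


def filter_counties(rows, min_per_state=25):
    """Drop incomplete rows, then keep only states with enough counties.

    `rows` is a list of dicts; the keys "state", "svi" and "diabetes_pct"
    are hypothetical names chosen for this sketch.
    """
    # Step 1: remove counties with a missing SVI or diabetes value.
    complete = [
        r for r in rows
        if r.get("svi") is not None and r.get("diabetes_pct") is not None
    ]
    # Step 2: count the remaining counties per state and apply the threshold.
    counts = Counter(r["state"] for r in complete)
    return [r for r in complete if counts[r["state"]] >= min_per_state]


if __name__ == "__main__":
    toy = [
        {"state": "A", "svi": 0.1, "diabetes_pct": 8.0},
        {"state": "A", "svi": 0.2, "diabetes_pct": 9.0},
        {"state": "A", "svi": None, "diabetes_pct": 9.5},
        {"state": "B", "svi": 0.3, "diabetes_pct": 7.0},
    ]
    # A lower threshold is used here only because the toy data are tiny.
    print(filter_counties(toy, min_per_state=2))
```

Counting is done after the missing-value step, so a state is judged on its usable counties rather than on its raw county total.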

Figure 2: Average Percent Diagnosed Diabetes by State

C. Statistical Methods
To analyze the relationship between diabetes prevalence and SVI, regression was performed using non-parametric kernel regression with the Nadaraya-Watson estimator [11,12], and bootstrap 95% confidence intervals (CI) were calculated. The kernel regression code was written from the formula and tested on small datasets, and its results were compared with those of the existing ksmooth function from the stats package in R (data not shown). Custom code was used so that the bandwidth could be adjusted and the degrees of freedom could be obtained from the regression model. The analysis suggested an overall linear relationship between diabetes prevalence and SVI. Thus, a 2-way ANOVA was performed to examine this relationship further, with an interaction term for county demographic status (urban vs. rural). The SVI variable was categorized using its estimated mean and standard deviation (SD): values below mean - SD indicate low SVI, values above mean + SD indicate high SVI, and values within one SD of the mean indicate average SVI. These categories were used as a factor to help explain the variance in diabetes prevalence per state.
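For readers unfamiliar with the method, the Nadaraya-Watson estimate at a point is a kernel-weighted average of the observed responses. The following is a minimal Python sketch, not the study's R code. The Gaussian kernel, the percentile bootstrap, the resampling of (x, y) pairs, and all function names are assumptions made for illustration.

```python
import math
import random


def nw_estimate(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x0 using a Gaussian kernel (assumed)."""
    # The kernel's normalizing constant cancels in the ratio, so it is omitted.
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def bootstrap_ci(x0, xs, ys, bandwidth, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the estimate at x0.

    Resamples (x, y) pairs with replacement; this scheme is an assumption,
    since the text does not describe how the bootstrap was set up.
    """
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            nw_estimate(x0, [xs[i] for i in idx], [ys[i] for i in idx], bandwidth)
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    svi = [0.0, 0.25, 0.5, 0.75, 1.0]      # toy predictor values
    diabetes = [7.0, 8.0, 8.5, 9.5, 11.0]  # toy responses, not study data
    print(nw_estimate(0.5, svi, diabetes, bandwidth=0.2))
    print(bootstrap_ci(0.5, svi, diabetes, bandwidth=0.2, n_boot=200))
```

A smaller bandwidth follows the data more closely, while a larger one smooths toward the overall mean. With a very narrow bandwidth and an evaluation point far from all observations, the weights can underflow to zero, so a production version would need a guard for that case.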

Table 1: Summary of the Variables

Variable                Minimum  Maximum  Mean   Median  Standard Deviation  Number of States  Number of Counties
Diagnosed Diabetes (%)  4.5      17.9     8.742  8.400   1.799               37                1776
SVI                     0.0      1.0      0.507  0.509   0.288               37                1776
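The SVI categories described in the Statistical Methods can be illustrated with the Table 1 summary values. With a mean of 0.507 and a standard deviation of 0.288, the cut-offs fall at roughly 0.219 and 0.795. The Python sketch below is illustrative only (the study used R). How exact boundary values are handled is an assumption, and the function name is hypothetical.

```python
def categorize_svi(svi, mean, sd):
    """Assign "low", "average" or "high" using mean +/- one SD cut-offs.

    Values exactly on a cut-off are treated as "average"; the text does
    not say how ties were handled, so this is an assumption.
    """
    if svi < mean - sd:
        return "low"
    if svi > mean + sd:
        return "high"
    return "average"


if __name__ == "__main__":
    mean_svi, sd_svi = 0.507, 0.288  # taken from Table 1
    for value in (0.10, 0.50, 0.90):
        print(value, categorize_svi(value, mean_svi, sd_svi))
```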

III. RESULTS

Table 1, Figure 1, and Figure 2 show that percent diagnosed diabetes differs between states (and counties). To understand whether these differences in diabetes prevalence are related to SVI, which also differs by state (and county), the relationship between diabetes prevalence and SVI was analyzed (Supp. Fig. 1, Fig. 3).

Figure 3: Relationship between SVI and Percent Diagnosed Diabetes

Based on the kernel regression, the relationship between SVI and percent diagnosed diabetes appears consistent with a linear relationship between the two variables. The degrees of freedom were estimated to be 6.56. This result supported the use of ANOVA, which simplified the model in order to examine whether there are meaningful differences between groups.
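The text does not state how the degrees of freedom were computed. One common definition for a linear smoother is the trace of its smoother matrix, which for the Nadaraya-Watson estimator is the sum over observations of each point's weight on itself. The Python sketch below illustrates that definition under the assumption of a Gaussian kernel. It is not the study's R code, and it is not claimed to reproduce the reported value of 6.56.

```python
import math


def nw_effective_df(xs, bandwidth):
    """Trace of the Nadaraya-Watson smoother matrix (Gaussian kernel assumed).

    S[i][j] = K((x_i - x_j) / h) / sum_k K((x_i - x_k) / h), so the diagonal
    entry is K(0) / sum_k K(...), and K(0) = 1 for the unnormalized kernel.
    """
    trace = 0.0
    for xi in xs:
        row_sum = sum(math.exp(-0.5 * ((xi - xk) / bandwidth) ** 2) for xk in xs)
        trace += 1.0 / row_sum
    return trace


if __name__ == "__main__":
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy SVI values, not study data
    for h in (0.05, 0.2, 1.0):
        print(h, nw_effective_df(xs, h))
```

Under this definition a very wide bandwidth gives a value near 1 (the fit approaches a constant), and a very narrow bandwidth gives a value near the number of distinct observations (the fit interpolates).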

Table 2: Association between SVI and Diabetes Prevalence Influenced by Rural/Urban Status

A. 2-Way ANOVA Results

Term                         Df    Sum Sq  Mean Sq  F value  Pr(>F)
SVI_Category                 2     1648    824      321.1    6.047e-127
Urban_vs_Rural               1     311.9   311.9    121.6    9.863e-28
SVI_Category:Urban_vs_Rural  2     41.81   20.9     8.147    0.0002961
Residuals                    2959  7592    2.566    NA       NA

B. Contrasts (FDR-corrected)

term                         contrast                       null.value  estimate  std.error  df    statistic  adj.p.value
SVI_Category*Urban_vs_Rural  low Rural - average Rural      0           -0.6937   0.1126     2959  -6.158     1.141e-09
SVI_Category*Urban_vs_Rural  low Rural - high Rural         0           -2.183    0.132      2959  -16.54     2.775e-58
SVI_Category*Urban_vs_Rural  low Rural - low Urban          0           -0.2708   0.1293     2959  -2.094     0.0363
SVI_Category*Urban_vs_Rural  low Rural - average Urban      0           -1.545    0.1075     2959  -14.38     5.912e-45
SVI_Category*Urban_vs_Rural  low Rural - high Urban         0           -2.696    0.1308     2959  -20.62     3.432e-87
SVI_Category*Urban_vs_Rural  average Rural - high Rural     0           -1.49     0.1097     2959  -13.58     1.943e-40
SVI_Category*Urban_vs_Rural  average Rural - low Urban      0           0.4229    0.1065     2959  3.972      7.831e-05
SVI_Category*Urban_vs_Rural  average Rural - average Urban  0           -0.8512   0.07854    2959  -10.84     1.078e-26
SVI_Category*Urban_vs_Rural  average Rural - high Urban     0           -2.003    0.1083     2959  -18.5      1.126e-71
SVI_Category*Urban_vs_Rural  high Rural - low Urban         0           1.912     0.1268     2959  15.09      3.903e-49
SVI_Category*Urban_vs_Rural  high Rural - average Urban     0           0.6384    0.1044     2959  6.115      1.364e-09
SVI_Category*Urban_vs_Rural  high Rural - high Urban        0           -0.5132   0.1283     2959  -4.001     7.465e-05
SVI_Category*Urban_vs_Rural  low Urban - average Urban      0           -1.274    0.101      2959  -12.62     2.543e-35
SVI_Category*Urban_vs_Rural  low Urban - high Urban         0           -2.426    0.1255     2959  -19.33     1.331e-77
SVI_Category*Urban_vs_Rural  average Urban - high Urban     0           -1.152    0.1029     2959  -11.19     2.678e-28
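The table reports FDR-corrected p-values but does not name the procedure. Assuming the widely used Benjamini-Hochberg method, the adjustment can be sketched in Python as follows. This is an illustration only (the study used R), and the function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumes the BH procedure; the table only states "FDR-corrected".
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # toy p-values, not taken from the study
    print(bh_adjust(raw))
```

Under this procedure the largest raw p-value is left unchanged, and the smaller ones are scaled up by the number of tests divided by their rank.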

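The 2-way ANOVA showed significant main effects of SVI category and of urban vs. rural status on diabetes prevalence, as well as a significant interaction between the two (Table 2A). The association between SVI and diabetes prevalence therefore depends on whether a county is urban or rural, and the FDR-corrected contrasts (Table 2B) were significant for every pair of groups.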
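As a consistency check on Table 2A, each F value should equal the term's mean square divided by the residual mean square. The short Python sketch below recomputes the ratios from the rounded table entries. It is illustrative only, the helper name is hypothetical, and small differences from the reported values are expected because the inputs are rounded.

```python
def f_statistic(sum_sq, df, resid_sum_sq, resid_df):
    """F = (term sum of squares / term df) / (residual SS / residual df)."""
    return (sum_sq / df) / (resid_sum_sq / resid_df)


if __name__ == "__main__":
    resid_ss, resid_df = 7592, 2959  # Residuals row of Table 2A
    terms = {
        "SVI_Category": (1648, 2),
        "Urban_vs_Rural": (311.9, 1),
        "SVI_Category:Urban_vs_Rural": (41.81, 2),
    }
    for name, (ss, df) in terms.items():
        print(name, round(f_statistic(ss, df, resid_ss, resid_df), 2))
```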

IV. CONCLUSIONS

Based on the analysis, SVI affected percent diagnosed diabetes values in a state-dependent manner. The number of rural counties within each state further affected this relationship. The Nadaraya-Watson kernel regression confirmed that there is a relationship between percent diagnosed diabetes and SVI, and that this relationship depends on multiple factors.

V. DISCUSSION

The study revealed a complex relationship between percent diagnosed diabetes and SVI. This relationship varies between states and counties, especially depending on whether counties are rural or urban. This is expected, since rural counties have a higher need for physical activity, while urbanization has been linked with increased diabetes outcomes in the past [3]. Furthermore, the relationship between SVI and diabetes has an overall positive slope, independent of the factors influencing this relationship. This is also expected, as higher social vulnerability in many forms affects diabetes prevalence. For instance, one component of SVI is low socioeconomic status, which has been linked with diabetes incidence in the past [3].

One important limitation of this analysis is that several states were not included (Arizona, Connecticut, Delaware, District of Columbia, Hawaii, Maine, Maryland, Massachusetts, Nevada, New Hampshire, New Jersey, Rhode Island, Vermont, Wyoming) because of the small amount of data available for them. In the future it would be important to repeat the analyses if more data become available. Another important limitation is that the data source did not differentiate between Type 1 and Type 2 diabetes. While Type 1 diabetes has mainly a genetic cause, the causes of Type 2 diabetes are usually a combination of genetic predisposition and environmental factors. Future work should therefore differentiate between Type 1 and Type 2 diabetes. Additionally, it would be important to start collecting and de-identifying genetic information from patients with and without diabetes, to better understand the interaction between environmental and genetic factors in humans.

Supplementary Figure 1: The Diabetes Prevalence and SVI Relationship in Urban vs. Rural Areas

REFERENCES

  1. Centers for Disease Control and Prevention. Division of Diabetes Translation website. “Diabetes Report Card.” https://www.cdc.gov/diabetes/library/reports/reportcard.html. Accessed Dec 15, 2021.
  2. Dolan C, Glynn R, Griffin S, et al. “Brain complications of diabetes mellitus: a cross-sectional study of awareness among individuals with diabetes and the general population in Ireland.” Diabet Med. 2018;35(7):871–879.
  3. Giovannucci E, Harlan DM, Archer MC, et al. “Diabetes and cancer: a consensus report.” Diabetes Care. 2010;33(7):1674–1685.
  4. American Diabetes Association. “Economic costs of diabetes in the U.S. in 2017.” Diabetes Care. 2018;41:917–928.
  5. Benoit SR, Hora I, Albright AL, Gregg EW. “New directions in incidence and prevalence of diagnosed diabetes in the USA.” BMJ Open Diab Res Care. 2019;7:e000657.
  6. Dabelea D, Mayer-Davis EJ, Saydah S, et al. “Prevalence of type 1 and type 2 diabetes among children and adolescents from 2001 to 2009.” JAMA. 2014;311(17):1778–1786.
  7. Andes LJ, Cheng YJ, Rolka DB, Gregg EW, Imperatore G. “Prevalence of prediabetes among adolescents and young adults in the United States, 2005-2016.” JAMA Pediatrics. 2019:e194498.
  8. Flanagan BE, Hallisey EJ, Adams E, Lavery A. “Measuring Community Vulnerability to Natural and Anthropogenic Hazards: The Centers for Disease Control and Prevention’s Social Vulnerability Index.” Journal of Environmental Health. 2018;80(10):34–36.
  9. Khan SU, Javed Z, Lone AN, Dani SS, Amin Z, Al-Kindi SG, Virani SS, Sharma G, Blankstein R, Blaha MJ, Cainzos-Achirica M, Nasir K. “Social Vulnerability and Premature Cardiovascular Mortality Among US Counties, 2014 to 2018.” Circulation. 2021;144:1272–1279.
  10. Kahle D, Wickham H. “ggmap: Spatial Visualization with ggplot2.” The R Journal. 2013;5(1):144–161.
  11. Nadaraya EA. “On Estimating Regression.” Theory of Probability and Its Applications. 1964;9(1):141–142.
  12. Watson GS. “Smooth regression analysis.” Sankhyā: The Indian Journal of Statistics, Series A. 1964;26(4):359–372.